Identification of germline variants that predispose to familial melanoma
Melanoma is an extremely aggressive malignancy with a poor prognosis in advanced disease. While GWAS and exome analysis have helped to identify loci linked to the development of the disease, these studies have explained predisposition to melanoma in only a fraction of cases. Thus, the majority of the genetic factors that contribute to the pathogenesis of melanoma are yet to be defined. This project aims to identify novel genes and pathways involved in the development of familial melanoma, and to identify loci that predispose individuals to disease development.
A total of 308 individuals from 133 different families previously diagnosed with melanoma were sequenced using either exome or whole-genome sequencing. Multiple workflows were established to analyse the dataset for novel driver mutations. A novel approach combining association and linkage analysis was established for variants in the coding region, to identify genes with a high burden of mutations whose variants segregated with the disease within the pedigrees. The role of non-coding variants and structural variants in melanoma onset was also investigated through additional workflows in the whole-genome sequenced individuals.
Non-synonymous mutations were found in CDKN2A, BRCA1, POT1 and BAP1. Disruptive variants were also observed in novel genes such as EXO5, TP53AIP and AMER1. An increased burden of variants in transcription factor binding motifs was observed in genes including SYK and SRC. A large deletion upstream of CDKN2A was identified. Through the novel combined association-linkage analysis, genes including ATR and FAT1 were found to have a higher burden of disruptive variants that segregated with the disease within the cases.
Disruptive germline variants that could play a role in familial melanoma development were identified in multiple genes through a combination of several approaches.
Marie Curie fellowship under the Melgen ETN network
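To make the idea of variants that "segregate with the disease within the pedigrees" concrete, the sketch below counts, for each gene, the families in which a single variant is carried by every affected, sequenced member. It is only an illustrative counting rule, not the combined association-linkage statistic used in the thesis; the function name, the input dictionaries and the identifiers in the usage note are all hypothetical.

```python
from collections import defaultdict


def segregating_family_counts(carriers, affected):
    """Count, per gene, the families in which one variant is carried
    by every affected, sequenced member.

    carriers: dict mapping (gene, variant_id, family_id) -> set of
              individual ids carrying that variant (hypothetical layout).
    affected: dict mapping family_id -> set of affected, sequenced
              individual ids (hypothetical layout).
    """
    families_by_gene = defaultdict(set)
    for (gene, _variant, family), individuals in carriers.items():
        cases = affected.get(family, set())
        # The variant segregates in this family if all cases carry it.
        if cases and cases <= individuals:
            families_by_gene[gene].add(family)
    return {gene: len(fams) for gene, fams in families_by_gene.items()}
```

For example, with two families in which all cases carry a variant in a placeholder gene "GENE_A", and one family in which only one of two cases carries a variant in "GENE_B", the function returns a count of 2 for "GENE_A" and omits "GENE_B".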
Unsupervised Extraction of Representative Concepts from Scientific Literature
This paper studies the automated categorization and extraction of scientific
concepts from titles of scientific articles, in order to gain a deeper
understanding of their key contributions and facilitate the construction of a
generic academic knowledgebase. Towards this goal, we propose an unsupervised,
domain-independent, and scalable two-phase algorithm to type and extract key
concept mentions into aspects of interest (e.g., Techniques, Applications,
etc.). In the first phase of our algorithm we propose PhraseType, a
probabilistic generative model which exploits textual features and limited POS
tags to broadly segment text snippets into aspect-typed phrases. We extend this
model to simultaneously learn aspect-specific features and identify academic
domains in multi-domain corpora, since the two tasks mutually enhance each
other. In the second phase, we propose an approach based on adaptor grammars to
extract fine grained concept mentions from the aspect-typed phrases without the
need for any external resources or human effort, in a purely data-driven
manner. We apply our technique to study literature from diverse scientific
domains and show significant gains over state-of-the-art concept extraction
techniques. We also present a qualitative analysis of the results obtained.
Comment: Published as a conference paper at CIKM 201
Sparsity-aware neural user behavior modeling in online interaction platforms
Modern online platforms offer users an opportunity to participate in a variety of content-creation, social networking, and shopping activities. With the rapid proliferation of such online services, learning data-driven user behavior models is indispensable to enable personalized user experiences. Recently, representation learning has emerged as an effective strategy for user modeling, powered by neural networks trained over large volumes of interaction data. Despite their enormous potential, we encounter the unique challenge of data sparsity for a vast majority of entities, e.g., sparsity in ground-truth labels for entities and in entity-level interactions (cold-start users, items in the long-tail, and ephemeral groups).
In this dissertation, we develop generalizable neural representation learning frameworks for user behavior modeling designed to address different sparsity challenges across applications. Our problem settings span transductive and inductive learning scenarios, where transductive learning models entities seen during training and inductive learning targets entities that are only observed during inference. We leverage different facets of information reflecting user behavior (e.g., interconnectivity in social networks, temporal and attributed interaction information) to enable personalized inference at scale. Our proposed models are complementary to concurrent advances in neural architectural choices and are adaptive to the rapid addition of new applications in online platforms.
First, we examine two transductive learning settings: inference and recommendation in graph-structured and bipartite user-item interactions. In chapter 3, we formulate user profiling in social platforms as semi-supervised learning over graphs given sparse ground-truth labels for node attributes. We present a graph neural network framework that exploits higher-order connectivity structures (network motifs) to learn attributed structural roles of nodes that identify structurally similar nodes with co-varying local attributes. In chapter 4, we design neural collaborative filtering models for few-shot recommendations over user-item interactions. To address item interaction sparsity due to heavy-tailed distributions, our proposed meta-learning framework learns-to-recommend few-shot items by knowledge transfer from arbitrary base recommenders. We show that our framework consistently outperforms state-of-the-art approaches on overall recommendation (by 5% Recall) while achieving significant gains (of 60-80% Recall) for tail items with fewer than 20 interactions.
Next, we explore three inductive learning settings: modeling spread of user-generated content in social networks; item recommendations for ephemeral groups; and friend ranking in large-scale social platforms. In chapter 5, we focus on diffusion prediction in social networks where a vast population of users rarely post content. We introduce a deep generative modeling framework that models users as probability distributions in the latent space with variational priors parameterized by graph neural networks. Our approach enables massive performance gains (over 150% recall) for users with sparse activities while being faster than state-of-the-art neural models by an order of magnitude. In chapter 6, we examine item recommendations for ephemeral groups with limited or no historical interactions together. To overcome group interaction sparsity, we present self-supervised learning strategies that exploit the preference co-variance in observed group memberships for group recommender training. Our framework achieves significant performance gains (over 30% NDCG) over prior state-of-the-art group recommendation models. In chapter 7, we introduce multi-modal inference with graph neural networks that captures knowledge from multiple feature modalities and user interactions for multi-faceted friend ranking. Our approach achieves notably higher performance gains for critical populations of less-active and low-degree users.
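The Recall figures quoted above are reported both overall and for tail items with fewer than 20 interactions. The sketch below shows one common way such a split could be computed; the exact evaluation protocol of the dissertation is not given here, so the function names, the cutoff parameter k and the data layout are assumptions made for illustration.

```python
from collections import Counter


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def tail_items(train_interactions, max_count=20):
    """Items with fewer than max_count (user, item) training interactions."""
    counts = Counter(item for _user, item in train_interactions)
    return {item for item, c in counts.items() if c < max_count}


def mean_recall(rankings, held_out, k, restrict_to=None):
    """Average recall@k over users.

    rankings: dict user -> ranked list of recommended items.
    held_out: dict user -> iterable of held-out relevant items.
    restrict_to: optional set; if given, only held-out items in this set
                 (e.g. tail items) count as relevant.
    """
    scores = []
    for user, ranked in rankings.items():
        relevant = set(held_out.get(user, ()))
        if restrict_to is not None:
            relevant &= restrict_to
        if relevant:  # skip users with nothing relevant to retrieve
            scores.append(recall_at_k(ranked, relevant, k))
    return sum(scores) / len(scores) if scores else 0.0
```

Calling mean_recall once without restrict_to and once with restrict_to=tail_items(train) gives the overall and tail-item numbers side by side.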
Audience-Centric Natural Language Generation via Style Infusion
Adopting contextually appropriate, audience-tailored linguistic styles is
critical to the success of user-centric language generation systems (e.g.,
chatbots, computer-aided writing, dialog systems). While existing approaches
demonstrate textual style transfer with large volumes of parallel or
non-parallel data, we argue that grounding style on audience-independent
external factors is innately limiting for two reasons. First, it is difficult
to collect large volumes of audience-specific stylistic data. Second, some
stylistic objectives (e.g., persuasiveness, memorability, empathy) are hard to
define without audience feedback.
In this paper, we propose the novel task of style infusion - infusing the
stylistic preferences of audiences in pretrained language generation models.
Since humans are better at pairwise comparisons than direct scoring - i.e., is
Sample-A more persuasive/polite/empathic than Sample-B - we leverage limited
pairwise human judgments to bootstrap a style analysis model and augment our
seed set of judgments. We then infuse the learned textual style in a GPT-2
based text generator while balancing fluency and style adoption. With
quantitative and qualitative assessments, we show that our infusion approach
can generate compelling stylized examples with generic text prompts. The code
and data are accessible at https://github.com/CrowdDynamicsLab/StyleInfusion.
Comment: 14 pages, 3 figures, Accepted in Findings of EMNLP 202
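As a rough illustration of how pairwise judgments ("is Sample-A more persuasive than Sample-B?") can be turned into per-sample style scores, the sketch below fits a Bradley-Terry model by gradient ascent. This is a standard technique swapped in for illustration; it is not claimed to be the style analysis model of the paper, and the function names and hyperparameters are assumptions.

```python
import math


def fit_bradley_terry(comparisons, n_items, lr=0.1, epochs=500, l2=0.01):
    """Fit one latent score per item from (winner, loser) index pairs.

    Model: P(i preferred over j) = sigmoid(score[i] - score[j]).
    Uses plain gradient ascent on the L2-regularised log-likelihood.
    """
    scores = [0.0] * n_items
    for _ in range(epochs):
        grads = [-l2 * s for s in scores]  # gradient of the L2 penalty
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores


def prob_preferred(scores, a, b):
    """Model probability that item a is preferred over item b."""
    return 1.0 / (1.0 + math.exp(scores[b] - scores[a]))
```

The fitted scores could then label further unjudged pairs, which is one way a small seed set of judgments might be augmented.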
ENHANCED REAL-TIME GROUP AUCTION SYSTEM FOR EFFICIENT ALLOCATION OF CLOUD INTERNET APPLICATIONS
Cloud internet applications have recently attracted a large number of users on the Internet. With the advent of these applications, allocating the maximum number of participants in a real-time group auction system has become inefficient. An efficient approximation algorithm is therefore proposed together with an improved combinatorial double auction protocol. The protocol is developed to enable different kinds of resources to be distributed among multiple users and providers, while also admitting a larger number of participants to an auction. Because binary integer programming for resource distribution in a real-time group auction system is NP-hard, the improved approximation algorithm is proposed to deal with the NP-hardness and to obtain the optimal solution. Participant honesty is necessary to ensure auction trustfulness.
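For readers unfamiliar with double auctions, the sketch below clears a much simpler setting than the one described above: a greedy double auction in which every buyer wants, and every seller offers, one identical resource unit. It is not the paper's combinatorial protocol or its approximation algorithm; the function name, the data layout and the midpoint pricing rule are assumptions made for illustration.

```python
def clear_double_auction(bids, asks):
    """Greedily match buyers and sellers of one identical resource unit each.

    bids: list of (buyer_id, price offered).
    asks: list of (seller_id, price requested).
    Returns a list of (buyer_id, seller_id, trade_price).
    """
    buyers = sorted(bids, key=lambda b: b[1], reverse=True)  # highest bid first
    sellers = sorted(asks, key=lambda a: a[1])               # lowest ask first
    trades = []
    for (buyer, bid), (seller, ask) in zip(buyers, sellers):
        if bid < ask:
            break  # once bid and ask curves cross, no further match is feasible
        # Midpoint pricing is an assumption, not taken from the source.
        trades.append((buyer, seller, (bid + ask) / 2))
    return trades
```

In the combinatorial case each participant bids on bundles of several resource types, which is what makes exact allocation a binary integer program and motivates an approximation algorithm.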
Bayesian identification of bacterial strains from sequencing data
Rapidly assaying the diversity of a bacterial species present in a sample obtained from a hospital patient or an environmental source has become possible after recent technological advances in DNA sequencing. For several applications it is important to accurately identify the presence of the target organisms and estimate their relative abundances from short sequence reads obtained from a sample. This task is particularly challenging when the set of interest includes very closely related organisms, such as different strains of pathogenic bacteria, which can vary considerably in terms of virulence, resistance and spread. Using advanced Bayesian statistical modelling and computation techniques we introduce a novel pipeline for bacterial identification that is shown to outperform the currently leading pipeline for this purpose. Our approach enables fast and accurate sequence-based identification of bacterial strains while using only modest computational resources. Hence it provides a useful tool for a wide spectrum of applications, including rapid clinical diagnostics to distinguish among closely related strains causing nosocomial infections. The software implementation is available at https://github.com/PROBIC/BIB.
Peer reviewed
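The core estimation problem, inferring relative strain abundances from reads that may match several closely related strains, can be sketched with a simple mixture model. The code below uses expectation-maximisation to obtain a maximum-likelihood estimate; it is a simplified stand-in, not the Bayesian inference implemented in the BIB pipeline, and the function name and the per-read likelihood matrix are hypothetical inputs.

```python
def estimate_abundances(read_likelihoods, n_iter=200):
    """Estimate strain proportions with EM on a mixture model.

    read_likelihoods: list of rows, one per read; row[s] is the
    (hypothetical, precomputed) likelihood of the read given strain s.
    Returns a list of proportions that sums to 1.
    """
    n_strains = len(read_likelihoods[0])
    weights = [1.0 / n_strains] * n_strains  # start from equal abundances
    for _ in range(n_iter):
        totals = [0.0] * n_strains
        for lik in read_likelihoods:
            joint = [w * l for w, l in zip(weights, lik)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # read is incompatible with every strain; skip it
            # E-step: share the read among strains by posterior probability.
            for s in range(n_strains):
                totals[s] += joint[s] / norm
        total = sum(totals)
        if total == 0.0:
            break  # no informative reads; keep the current weights
        # M-step: new proportions are the normalised expected read counts.
        weights = [t / total for t in totals]
    return weights
```

Reads that align equally well to two strains contribute to both in proportion to the current abundance estimates, which is why closely related strains are the hard case.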